Ultra High Resolution (UHR)-IonStar is an MS1-based quantitative method for label-free proteomics experiments, devised to address issues related to quantitative precision, missing data, and false-positive discovery of protein changes in large-cohort analysis.
UHR-IonStar comprises two parts: experimental procedures (left panel) and a proteomics data analysis pipeline (right panel). This manual provides a sample preparation protocol but focuses mainly on the data analysis pipeline, to help users run the pipeline in their own computational environment.
The following steps describe a general sample preparation protocol for UHR-IonStar. Details of the experimental procedures can be found in Shen et al., J Proteome Res. (2017) and An et al., Anal Chem. (2015).
Take 50 mg (or 100 mg if the protein yield is low) of tissue for the following analysis.
Weigh the tissue sample and record.
Prepare Lysis buffer with protease and phosphatase inhibitors (\(1\) tablet for \(10ml\) lysis buffer):
Add \(10\)x lysis buffer.
Vortex and spin.
For small amounts of tissue: homogenize for 30 s using a pellet pestle.
For tough tissue (e.g. skin): set the Polytron speed to \(15,000~rpm\) (between samples, wash the probe sequentially with methanol, water, and lysis buffer); homogenize the sample in \(5\)-second bursts with cooling in between, and repeat \(5\) times.
Keep the samples on ice for one hour.
Sonicate in \(8\) bursts with a high-energy sonicator (level \(14\)), cool for \(2\) seconds, and repeat \(3\) times. Rinse the probe between samples.
Keep the samples on ice for one hour or overnight to allow thorough protein extraction.
Centrifuge at \(20,000\)g, \(4^\circ C\) for \(30\) minutes. Prepare a new set of EP tubes. Prepare standards for BCA.
Take the supernatant (extracted proteins) to the new set of EP tubes and measure protein concentration using BCA.
Take \(100\mu g\) protein per sample (according to the protein concentration measured by BCA) and dilute the solution with \(0.5\%\) SDS to \(100\mu l\).
Prepare and add \(5\mu l\) DTT to each tube, vortex and spin, incubate at \(56^\circ C\) for \(30\)min.
Prepare and add \(5\mu l\) IAM to each tube, vortex and spin, incubate at \(37^\circ C\) for \(30\)min (IAM is light sensitive, make and use in the dark).
Add a small volume (\(sample: acetone = 1:1\), around \(100\mu l\)) of chilled acetone (\(-20^\circ C\)) and vortex.
Add a large volume (\(sample: acetone = 1: 4~to~5\), around \(500\mu l\)) of chilled acetone, vortex, then incubate at \(-20^\circ C\) for \(3\) hours or overnight.
Centrifuge the samples at \(20,000\)g for \(30\)min at \(4^\circ C\), then remove the supernatant (absorb the liquid carefully with Kimtech, add around \(700\mu l\) methanol to rinse the tube, discard the methanol, spin, and take out the remaining liquid with a loading tip).
Re-suspend the protein pellets in \(80\mu l\) Tris-FA (\(pH=8.5\)) buffer, sonicate gently to loosen the pellets.
Activate trypsin: add \(85\mu l\) of Tris-FA (\(pH=8.5\)) buffer to \(20\mu g\) trypsin (Sigma-Aldrich). If using multiple tubes of enzyme, combine all the fractions and mix well before adding to the samples.
Add activated trypsin (\(20\mu l\)) to each tube (\(Substrate:Enzyme = 20:1\)), and incubate for \(6\) hours or overnight at \(37^\circ C\).
Terminate the digestion by adding \(1\mu l\) FA (\(1\%\) v/v) to each tube, vortex, centrifuge at \(20,000\)g at \(4^\circ C\) for \(30\)min. Prepare a new set of sample vials.
Transfer about \(90\mu l\) of the supernatant to sample vial carefully for LC-MS analysis.
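The volumes in the protocol above follow from simple arithmetic, sketched here in Python. The function names are hypothetical and the example concentration of 4 µg/µl is made up for illustration; only the 100 µg / 100 µl target and the 20 µg / 85 µl / 20 µl trypsin figures come from the protocol.

```python
def dilution_volumes(conc_ug_per_ul, target_ug=100.0, final_ul=100.0):
    """Return (lysate_ul, sds_ul) that give target_ug protein in final_ul."""
    if conc_ug_per_ul <= 0:
        raise ValueError("BCA concentration must be positive")
    lysate_ul = target_ug / conc_ug_per_ul
    if lysate_ul > final_ul:
        raise ValueError("lysate is too dilute to reach the target amount")
    # The remaining volume is made up with 0.5% SDS.
    return lysate_ul, final_ul - lysate_ul


def trypsin_added_ug(stock_ug=20.0, stock_ul=85.0, added_ul=20.0):
    """Amount of trypsin (ug) in the volume of activated stock that is added."""
    return stock_ug / stock_ul * added_ul


# Example: a lysate measured at 4 ug/ul by BCA (illustrative value).
lysate_ul, sds_ul = dilution_volumes(4.0)
enzyme_ug = trypsin_added_ug()
substrate_to_enzyme = 100.0 / enzyme_ug

print(f"lysate: {lysate_ul:.1f} ul, 0.5% SDS: {sds_ul:.1f} ul")
print(f"trypsin added: {enzyme_ug:.2f} ug, substrate:enzyme = {substrate_to_enzyme:.1f}:1")
```

With these figures, 20 µl of activated trypsin carries about 4.7 µg of enzyme, so 100 µg of protein gives a ratio close to the nominal 20:1.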
The primary software packages used in UHR-IonStar are SIEVE™ and UHR-IonStar.
SIEVE™ is commercial software from Thermo Fisher Scientific. The latest version of SIEVE™ is v2.2 SP2. Please contact Thermo Fisher Scientific for a quote. To ensure proper performance, we recommend running SIEVE™ on a PC with at least a 16-core processor and at least 192 GB RAM.
The R Shiny web app UHR-IonStar, built by Dr. Qu's lab under R version 3.6.2 and Bioconductor version 3.10, can be downloaded here.
Protein identification can be performed with any database search engine and post-search processing tool. The final output is a spectrum report containing the PSMs from all sample runs that pass the confidence threshold (e.g. FDR). The spectrum report can be exported from a number of software packages, e.g. Proteome Discoverer or Scaffold. The key information needed for data integration is the rawfile name and the MS2 scan number. The spectrum report must be in .csv format.
The protein identification workflow currently used by our group features database searching by MS-GF+, post-search processing by IDPicker, and spectrum report generation by IonStarSPG.R. Detailed instructions can also be found.
Quantitative feature generation in UHR-IonStar is accomplished by SIEVE™ v2.2 SP2 (Thermo Scientific), which integrates ChromAlign for global 3-D chromatographic alignment and a direct ion current extraction (DICE) method for feature extraction.
To start the quantitative feature generation analysis, open SIEVE™ and select File -> Create new experiment. On the Designate Experiment Type page, select the Experiment Type based on the study. For a case-control experiment, use Two Sample Differential Analysis; for a multi-condition experiment (3 or more conditions including control), use Control Compare Trend.
Drag all rawfiles into the Raw File Selection page.
For Two Sample Differential Analysis, assign Condition A and Condition B in the two boxes; for Control Compare Trend, put all conditions in the upper box and assign the control condition in the lower box.
Two Sample Differential Analysis:
Control Compare Trend:
A reference file also needs to be selected. In general, the reference file should provide the highest alignment scores for all sample runs. In most cases, it is recommended to start with a file in the middle of the LC-MS sequence as the reference.
The parameters that need to be modified are Frame Time Width (min) and M/Z Width (ppm). The current setting is based on a 3-hr nano RPLC gradient with a Thermo Orbitrap instrument under 120K MS1 resolution. Manual optimization based on the LC-MS method may help to improve the performance of feature generation. All other parameters follow the default settings.
Check Generate all frames based upon all MS2 scan's retention times and precursor M/Zs to maximize the number of quantitative features. Alternatively, users can assign Maximum Number of Frames and Peak Intensity Threshold.
After setting the method, finish the wizard and save the .sdb file.
For UHR-IonStar, users do not need to run the Identify process. In the SIEVE Parameters window, MaxThreads should be changed according to the configuration of the computer used for SIEVE™. For example, 6 to 8 threads are recommended for a PC with a 16-core processor and 192 GB RAM. PCAProcess can also be disabled if needed to reduce the computational burden. Click the Update button to save the settings. Run Align (ChromAlign) first.
Upon finishing, alignment scores for all sample runs will be shown in the Alignment tab. Ideally, the majority of sample runs should have an alignment score of >0.8 to ensure the quality of quantitative feature generation. Change the reference file and rerun the ChromAlign process if the alignment scores are subpar (e.g. <0.7) for a large portion of the files.
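The score check above can be sketched in Python once the alignment scores have been copied out of the Alignment tab by hand. This is only an illustration: SIEVE is not scripted here, the run names and scores are made up, and the `max_poor_fraction` cutoff used to decide what counts as "a large portion" is an assumption, not a value from this manual.

```python
def review_alignment(scores, good=0.8, poor=0.7, max_poor_fraction=0.25):
    """Summarise ChromAlign scores given as {run_name: score}.

    `good` and `poor` are the 0.8 and 0.7 levels mentioned in the text;
    `max_poor_fraction` is an assumed cutoff for "a large portion".
    """
    if not scores:
        raise ValueError("no alignment scores given")
    n = len(scores)
    n_good = sum(1 for s in scores.values() if s > good)
    poor_runs = sorted(name for name, s in scores.items() if s < poor)
    return {
        "fraction_good": n_good / n,
        "poor_runs": poor_runs,
        "try_new_reference": len(poor_runs) / n > max_poor_fraction,
    }


# Illustrative scores for four runs (made-up values).
summary = review_alignment(
    {"run01": 0.91, "run02": 0.85, "run03": 0.66, "run04": 0.88}
)
print(summary)
```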
To change the reference file, click the "..." button in the Rawfiles line. Change the reference file by checking a new rawfile. Rerun Align and check the alignment scores again. When finished, run Frame to perform the DICE process.
After feature generation, the .sdb file will contain all quantitative features (i.e. frames) generated. For more detailed information about the use of SIEVE, please refer to the SIEVE User Guide.
After protein identification and quantitative feature generation, the R Shiny web app UHR-IonStar is used to integrate the spectrum report with the quantitative feature list and generate the final quantitative results. Procedures in this step include:
UHR-IonStar (Version 1.3.0) includes all of the former code from IonStar_Run.R and IonStar_FrameGen.R in its first part, called IonStarStat. For an explanation of the original code, go to the supplementary part or check here for the former source of IonStarStat.
UHR-IonStar runs under R version 3.6.2 with Bioconductor version 3.10, and also requires RStudio.
Open 'Setup.R' to install all packages that UHR-IonStar depends on.
During installation, the R console may ask "Update all/some/none? [a/s/n]:"; enter "a". It may also ask
"Do you want to install from sources the packages which need compilation? (Yes/no/cancel)"; it is fine to choose 'no'.
Set the working directory to where UHR-IonStar is located, then install the package IonStarStat_0.1.4.tar.gz.
After all packages are installed successfully, open 'ui.R' or 'server.R' and click 'Run App' at the top of the script window to start the app.
Occasionally an error may still occur because a package cannot be found, even after all of the packages above have been installed. In that case, install whichever package the console or the pop-up web page names, and repeat until the UHR-IonStar interface appears. If everything works, the web browser will look like the following figure:
In the spectrum report, the rawfile name column (sp_col[1]) should only contain the file name with no extension (e.g. II_B03_21_150304_human_ecoli_A_3ul_3um_column_95_HCD_OT_2hrs_30B_9B), and the MS2 scan number should be numeric (e.g. 58143).
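Before uploading, the two requirements above can be checked with a short script. This is a sketch under stated assumptions: the column headers `RawFile` and `ScanNumber` and the list of file extensions are hypothetical, so adjust them to match the actual spectrum report.

```python
import csv
import io

# Extensions that should not remain in the rawfile name (assumed list).
KNOWN_EXTENSIONS = (".raw", ".mzml", ".mzxml", ".mgf")


def check_spectrum_report(rows, raw_col="RawFile", scan_col="ScanNumber"):
    """Return a list of (line_number, message) for rows that break the rules."""
    problems = []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        if row[raw_col].strip().lower().endswith(KNOWN_EXTENSIONS):
            problems.append((line_no, "rawfile name still has an extension"))
        if not row[scan_col].strip().isdigit():
            problems.append((line_no, "MS2 scan number is not numeric"))
    return problems


# Small in-memory example; a real report would be opened with open(path, newline="").
example = "RawFile,ScanNumber\nrun01,58143\nrun02.raw,101\nrun03,scan7\n"
rows = list(csv.DictReader(io.StringIO(example)))
problems = check_spectrum_report(rows)
for line_no, message in problems:
    print(f"line {line_no}: {message}")
```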
Click 'IonStarStat', upload your files, and set the options to generate the annotated frame list and the sample list, which are both required for subsequent protein quantification. The annotated frame list .csv consists of Protein accession number, Peptide sequence, Frame ID, and the corresponding quantitative values in each sample.
Before running this part, download the files generated previously and modify the sample list so that each sample is assigned a GroupID. A GroupID can be any combination of alphabetic and numeric characters, e.g. A, Group1, 088714. Several data processing steps embedded in this part are not shown in the interface. For more details, go to the supplementary part to find the code explanation.
Then re-upload the modified file into the web app and select the data normalization and peptide aggregation methods. Then choose whether to perform outlier rejection and peptide deconvolution.
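Filling in the GroupID column can be done in a spreadsheet, or with a few lines of Python as sketched below. The layout assumed here (a `Rawfiles` column plus a `GroupID` column) is an assumption about the downloaded sample list, and the run names and groups are made up.

```python
import csv
import io


def assign_groups(sample_csv_text, groups):
    """Return the sample list as CSV text with a GroupID set for every row.

    `groups` maps each value of the 'Rawfiles' column to its GroupID.
    """
    reader = csv.DictReader(io.StringIO(sample_csv_text))
    fieldnames = list(reader.fieldnames)
    if "GroupID" not in fieldnames:
        fieldnames.append("GroupID")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        name = row["Rawfiles"]
        if name not in groups:
            raise KeyError(f"no GroupID given for {name}")
        row["GroupID"] = groups[name]
        writer.writerow(row)
    return out.getvalue()


# Illustrative sample list with two runs.
modified = assign_groups("Rawfiles\nrun01\nrun02\n", {"run01": "A", "run02": "B"})
print(modified)
```

Raising an error for an unassigned sample is a deliberate choice in this sketch, since every sample must have a GroupID before the file is re-uploaded.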
For further data processing, the column “PepNum” in the quantitative result dataset needs to be removed. Also, if the row names are long, we recommend renaming them to shorter ones for convenience.
You also need to change the “Rawfiles” column of the Sample ID file (group file) so that it corresponds to the row names in the quantitative dataset. The order can differ. Note that the modified group file does not have a number column in front.
Note that the names in the first row are restricted: they cannot contain symbols other than "_", and they cannot begin with a number (e.g. "9hTreat" is not allowed). For protein results, the first column can be "ProteinAccession:ProteinID" or just one of the two. For peptide results, the first column can be "ProteinAccession:ProteinID|PeptideSequence", or one of the first two followed by the sequence, separated by "|". Note that if the datasets contain only partial protein or peptide names, results verification in the last part cannot be performed.
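The naming rules above can be checked before upload with a sketch like the following. The regular expression is one reading of the stated restrictions (letters, digits, and "_" only, not starting with a digit), and the accession, ID, and sequence in the example are made up.

```python
import re

# Letters, digits and "_" only; the first character must not be a digit.
NAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def is_valid_name(name):
    """True if a first-row name follows the restrictions described above."""
    return bool(NAME_RE.match(name))


def parse_row_id(row_id):
    """Split 'ProteinAccession:ProteinID|PeptideSequence' into its parts.

    Missing parts are returned as None. If only one of accession/ID is
    present, it is returned under 'accession', since the two cannot be
    told apart from the string alone.
    """
    protein_part, bar, peptide = row_id.partition("|")
    accession, colon, protein_id = protein_part.partition(":")
    return {
        "accession": accession,
        "protein_id": protein_id if colon else None,
        "peptide": peptide if bar else None,
    }


print(is_valid_name("Treat_9h"), is_valid_name("9hTreat"))
print(parse_row_id("P00001:PROT1_HUMAN|PEPTIDEK"))
```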
This part focuses on data analysis for quantitative results. It contains basic statistical testing, missing data and decoy removal for case-control study design.
Missing data are removed with a simple threshold: a protein is removed if the sum of its log intensities in every condition is less than 4. This setting is not shown in the interface.
“Decoys” are peptide sequences identified from the reversed database; they are false positives, because they should not exist yet are detected. Decoys are essential for calculating false discovery rates but are useless for quantification, so they are removed by matching a specific name pattern. In the example, all decoys are removed because their names contain 'XXX' in our reversed database.
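The decoy filter amounts to a substring match on the row names. The sketch below illustrates the idea only and is not the app's own code; the row names are made up, and the tag should be changed if the reversed database marks decoys differently.

```python
def split_decoys(row_names, decoy_tag="XXX"):
    """Separate row names into (kept, removed) by a decoy name pattern."""
    kept = [name for name in row_names if decoy_tag not in name]
    removed = [name for name in row_names if decoy_tag in name]
    return kept, removed


# Illustrative row names; the second one carries the decoy tag.
kept, removed = split_decoys(["P00001:PROT1", "XXX_P00002:PROT2", "P00003:PROT3"])
print(f"kept {len(kept)}, removed {len(removed)}: {removed}")
```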
After the data processing step finishes, all results are saved in the system, so you can go directly to the data visualization part without uploading anything.
This section visualizes the data in different ways for different purposes. Six kinds of plots can be drawn, each carrying system-level information about the quantification results:
Intra-group CV box plot: This plot shows the coefficient of variation for each group.
Intensity curve plot: This plot shows the average protein intensity of every subject, sorted in decreasing order.
Inter-group correlation plot: This plot reveals the correlation between two groups. Before plotting, select the two groups whose correlation you want to see.
Pearson Correlation Matrix plot: This plot conveys the correlation between each pair of observation groups.
Plot for Principal Components Analysis: Uses Principal Components Analysis (PCA) to cluster data at the observation-group level. Every observation group is shown in its own color, and you can see the relationship among all groups based on the first two principal components.
Ratio Distribution plot: After choosing which groups the plot should show, you can see the ratio distribution (selected group divided by the control group; multiple selection is available). When ‘Correction’ is selected, the curve is centered at the location of maximum density.
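As an illustration of the quantity behind the intra-group CV box plot, the sketch below computes one CV per protein per group. It assumes linear-scale intensities and uses the sample standard deviation; how the app itself computes CV is not stated here, and the data are made up.

```python
from statistics import mean, stdev


def intra_group_cv(intensities, groups):
    """Return {group: [CV% per protein]}.

    intensities: {protein: {sample: linear-scale intensity}}
    groups:      {sample: GroupID}
    """
    cvs = {}
    for protein, by_sample in intensities.items():
        per_group = {}
        for sample, value in by_sample.items():
            per_group.setdefault(groups[sample], []).append(value)
        for group, values in per_group.items():
            # A CV needs at least two replicates and a non-zero mean.
            if len(values) < 2 or mean(values) == 0:
                continue
            cvs.setdefault(group, []).append(100.0 * stdev(values) / mean(values))
    return cvs


groups = {"A1": "A", "A2": "A", "A3": "A", "B1": "B", "B2": "B", "B3": "B"}
intensities = {
    "prot1": {"A1": 100, "A2": 110, "A3": 90, "B1": 200, "B2": 200, "B3": 200},
}
cv = intra_group_cv(intensities, groups)
print(cv)
```

Each list in the result holds the values that one box in the plot would summarise.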
It is crucial to narrow the scope from system-wide data down to specific potential biomarkers.
After statistical testing, the proteins or peptides with significant changes are identified. Four plots then show them quantitatively and functionally:
Volcano Plot: shows up-regulated, down-regulated, and unchanged proteins in different colors.
Intensity curve with significantly changed proteins marked
Up-regulated and Down-regulated protein boxplot: first set how many proteins are shown in the plot.
Gene Ontology plot: shows the biological processes that involve the significantly changed proteins. We recommend using the Protein Accession Number as the ID type for each protein. This plot may fail with an error if the number of selected proteins is too small.
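The three classes shown in the volcano plot can be sketched as a simple rule on the ratio and the p-value. The cutoffs below (1.5-fold change, p < 0.05) are assumed defaults for illustration only and are not taken from this manual; use the values set in the app.

```python
import math


def classify_change(ratio, p_value, fold_cutoff=1.5, p_cutoff=0.05):
    """Label a protein 'up', 'down' or 'unchanged'.

    ratio is case/control on a linear scale; both cutoffs are assumptions.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    log2_ratio = math.log2(ratio)
    threshold = math.log2(fold_cutoff)
    if p_value < p_cutoff and log2_ratio >= threshold:
        return "up"
    if p_value < p_cutoff and log2_ratio <= -threshold:
        return "down"
    return "unchanged"


# Illustrative (ratio, p-value) pairs.
for ratio, p in [(2.0, 0.01), (0.5, 0.01), (2.0, 0.20), (1.1, 0.001)]:
    print(ratio, p, classify_change(ratio, p))
```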
This part focuses on double-checking the quantification results by combining the protein results with the peptide results. The quantity of a protein is related to its peptide components. Sometimes, however, there is a large intensity difference between a protein and its constituent peptides, and such data could be considered unreliable.
Users can run the “Data Processing” part for the protein and peptide lists separately, then come to the “Re-verification” part to integrate and compare the results.
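The idea of this comparison can be sketched as follows: peptides are grouped under their protein by accession number, and a protein is flagged when its value is far from the median of its peptides. The `max_gap` cutoff, the use of log2 ratios, and the example values are all assumptions made for illustration, not the app's own rule.

```python
from statistics import median


def accession_of(row_id):
    """Accession from 'Accession:ID' or 'Accession:ID|Sequence' row names."""
    return row_id.split("|", 1)[0].split(":", 1)[0]


def compare_protein_to_peptides(protein_log2, peptide_log2, max_gap=1.0):
    """Match both lists by accession and flag large protein/peptide gaps."""
    peptides_by_acc = {}
    for row_id, value in peptide_log2.items():
        peptides_by_acc.setdefault(accession_of(row_id), []).append(value)
    report = {}
    for row_id, protein_value in protein_log2.items():
        acc = accession_of(row_id)
        values = peptides_by_acc.get(acc)
        if not values:
            continue  # no matching peptides, nothing to compare
        pep_median = median(values)
        report[acc] = {
            "protein": protein_value,
            "peptide_median": pep_median,
            "n_peptides": len(values),
            "flag": abs(protein_value - pep_median) > max_gap,
        }
    return report


protein_log2 = {"P00001:PROT1": 1.0, "P00002:PROT2": 0.2}
peptide_log2 = {
    "P00001:PROT1|PEPTIDEA": 0.9,
    "P00001:PROT1|PEPTIDEB": 1.1,
    "P00002:PROT2|PEPTIDEC": 2.0,
}
report = compare_protein_to_peptides(protein_log2, peptide_log2)
print(report)
```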
This part works by matching the protein accession numbers between the two lists, so before running it, users should adjust the list format slightly:
The integration result is shown below:
v1.1.1:
Add ‘IonstarStat’ part.
Add the option ‘Correction’ when drawing a ratio distribution plot.
v1.1.2:
Relax the limitation that each group must have the same number of columns.
Replace ‘One-way ANOVA’ with ‘Pair t-test’.
Add a function to draw the Gene Ontology plot.
v1.2:
Peptide quantification is now available.
Fixed a bug that prevented choosing the plot size in the Biomarker Discovery section.
v1.3:
The down-regulated plot now works correctly.
Improve some interface details.
v1.4:
Add options of 'data normalization', 'peptide aggregation' in Protein Quantification part.
Peptide Deconvolution is available.
Improve some interface details.
For questions, suggestions, and other topics about UHR-IonStar, feel free to contact us:
Shuo Qian: sqian@buffalo.edu
Shichen Shen: shichens@buffalo.edu
Xue Wang: xwang79@buffalo.edu
Jun Qu: junqu@buffalo.edu